NSF PAR Search | NSF Public Access Repository

Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data

https://doi.org/10.1145/3643540

Xu, Xuhai; Yao, Bingsheng; Dong, Yuanzhe; Gabriel, Saadia; Yu, Hong; Hendler, James; Ghassemi, Marzyeh; Dey, Anind K; Wang, Dakuo (March 2024, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)

Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate a promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction fine-tuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best fine-tuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (which is 25 and 15 times bigger, respectively) by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study on LLMs' capability on mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. Meanwhile, we also emphasize the important limitations before achieving deployability in real-world mental health settings, such as known racial and gender bias. We highlight the important ethical risks accompanying this line of research.
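
The abstract reports its gains in balanced accuracy. As a reminder of what that metric measures, here is a minimal sketch using the standard definition (the mean of per-class recall). The function name and the label lists are hypothetical and for illustration only; this is not the authors' evaluation code or data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, over the classes that appear in y_true."""
    correct = defaultdict(int)  # correctly predicted examples per true class
    total = defaultdict(int)    # number of examples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced labels (1 = positive class, 0 = negative class).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain)                              # plain accuracy
print(balanced_accuracy(y_true, y_pred))  # balanced accuracy
```

On imbalanced data like this, plain accuracy is dominated by the majority class, while balanced accuracy weights each class equally.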
Full Text Available
Misinfo Reaction Frames: Reasoning about Readers’ Reactions to News Headlines

https://doi.org/10.18653/v1/2022.acl-long.222

Gabriel, Saadia; Hallinan, Skyler; Sap, Maarten; Nguyen, Pemi; Roesner, Franziska; Choi, Eunsol; Choi, Yejin (January 2022, ACL)

Full Text Available
Social Bias Frames: Reasoning about Social and Power Implications of Language

https://doi.org/10.18653/v1/2020.acl-main.486

Sap, Maarten; Gabriel, Saadia; Qin, Lianhui; Jurafsky, Dan; Smith, Noah A; Choi, Yejin (July 2020, Association for Computational Linguistics)

Full Text Available
Detecting and Tracking Communal Bird Roosts in Weather Radar Data

https://doi.org/10.1609/aaai.v34i01.5373

Cheng, Zezhou; Gabriel, Saadia; Bhambhani, Pankaj; Sheldon, Daniel; Maji, Subhransu; Laughlin, Andrew; Winkler, David (June 2020, Proceedings of the AAAI Conference on Artificial Intelligence)

The US weather radar archive holds detailed information about biological phenomena in the atmosphere over the last 20 years. Communally roosting birds congregate in large numbers at nighttime roosting locations, and their morning exodus from the roost is often visible as a distinctive pattern in radar images. This paper describes a machine learning system to detect and track roost signatures in weather radar data. A significant challenge is that labels were collected opportunistically from previous research studies and there are systematic differences in labeling style. We contribute a latent-variable model and EM algorithm to learn a detection model together with models of labeling styles for individual annotators. By properly accounting for these variations we learn a significantly more accurate detector. The resulting system detects previously unknown roosting locations and provides comprehensive spatio-temporal data about roosts across the US. This data will provide biologists important information about the poorly understood phenomena of broad-scale habitat use and movements of communally roosting birds during the non-breeding season.
Full Text Available
The Risk of Racial Bias in Hate Speech Detection

Sap, Maarten; Card, Dallas; Gabriel, Saadia; Choi, Yejin; Smith, Noah A (August 2019, ACL)

We investigate how annotators’ insensitivity to differences in dialect can lead to racial bias in automatic hate speech detection models, potentially amplifying harm against minority populations. We first uncover unexpected correlations between surface markers of African American English (AAE) and ratings of toxicity in several widely used hate speech datasets. Then, we show that models trained on these corpora acquire and propagate these biases, such that AAE tweets and tweets by self-identified African Americans are up to two times more likely to be labelled as offensive compared to others. Finally, we propose dialect and race priming as ways to reduce the racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet’s dialect they are significantly less likely to label the tweet as offensive.
Full Text Available